
VAST 2009 Challenge
Challenge 2 - Social Network and Geospatial

Authors and Affiliations:

Dr. Peter Bak, University of Konstanz, bak@dbvis.inf.uni-konstanz.de [PRIMARY contact]
Stefan Moritz Koch, University of Konstanz, stefan.2.koch@uni-konstanz.de [author,analyst]
Simon Butscher, University of Konstanz, simon.butscher@uni-konstanz.de [author, analyst]

 

Tool(s):

To solve the challenge we used a combination of tools. To preprocess the data, we relied on a small PHP script. To visualize the network data we used Pajek, a popular network analysis program [http://pajek.imfm.si/doku.php]. Although Pajek offers a lot of functionality, we used only a small part of it: mainly the force-directed layout algorithms, the degree filter, measures like centrality, and walks of limited length. Besides these tools, we developed a small Java tool to help us analyze the network data according to the given constraints on the network structure.

 

Video:

 

            Video.wmv

 

ANSWERS:


MC2.1: Which of the two social structures, A or B, most closely match the scenario you have identified in the data?

            A


MC2.2:  Provide the social network structure you have identified as a tab delimitated file. It should contain the employee, one or more handler, any middle folks, and the localized leader with their international contacts. What are the Flitter names of the persons involved? Please identify only key connections (not all single links for example) as well as any other nodes related to the scenario (if any) you may have discovered that were not described in the two scenarios A and B above.

            Filtter.txt


MC2.3:  Characterize the difference between your social network and the closest social structure you selected (A or B). If you include extra nodes please explain how they fit in to your scenario or analysis. 

1.      Approach

 

pipeline

    Figure 1 – The pipeline we used for our analysis: the data is first selected and aggregated, then explored in an iterative visualization loop.

 

2.      Selection and Preprocessing

We started our analysis by familiarizing ourselves with the data and writing down the constraints for each scenario, sorting them into those we judged necessary, possible, or merely speculative. The data was imported into a MySQL database using Navicat Lite. We then built an aggregated table containing all the connection information that was given, e.g. each user's exact geolocation on the map and connection count, using a small PHP script that took us one hour to write. The connection data itself was loaded into Pajek using the txt2pajek helper tool.
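The aggregation and export step can be illustrated with a minimal Python sketch. It is not our PHP script or the txt2pajek tool. The function names and the toy edge list are hypothetical, and the links are assumed to be undirected.

```python
from collections import defaultdict


def connection_counts(edges):
    """Return {user_id: number of distinct contacts} for an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {user: len(contacts) for user, contacts in neighbours.items()}


def to_pajek(edges):
    """Render an undirected edge list as Pajek .net text (vertices numbered from 1)."""
    users = sorted({user for edge in edges for user in edge})
    index = {user: i for i, user in enumerate(users, start=1)}
    lines = [f"*Vertices {len(users)}"]
    lines += [f'{index[user]} "{user}"' for user in users]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]}" for a, b in edges]
    return "\n".join(lines)


# Toy data for illustration only, not taken from the challenge data set.
toy_edges = [(10, 20), (10, 30), (20, 30), (30, 40)]
counts = connection_counts(toy_edges)
pajek_text = to_pajek(toy_edges)
```

The connection counts correspond to the vertex degree that we later used for coloring and filtering in Pajek.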

We initially visualized the complete graph using Pajek’s force-directed layout algorithms and then reduced the network using a Pajek partition in which vertices are colored according to their connection count. The result was still a very cluttered view, so we decided to apply more constraints to get rid of irrelevant information.

To do that, we first defined four classes (employee, handler, middleman and fearless leader) and assigned the persons to these classes according to their connection counts. Based on these classes we added further constraints, first with SQL statements and later with a lightweight Java tool that we developed to structure the process of adding constraints.
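The class assignment can be sketched in a few lines of Python. This is an illustration, not our SQL or Java code. The middleman range (4-5 contacts) and the leader threshold (over 100 contacts) are those used in our analysis of scenario A. The employee and handler ranges below are placeholders and must be replaced by the values from the task description.

```python
# Inclusive (min, max) connection counts per class.
# "middleman" and "leader" follow the scenario A values used in the text;
# "employee" and "handler" are PLACEHOLDER ranges chosen for this sketch.
ROLE_RANGES = {
    "employee": (37, 43),
    "handler": (28, 42),
    "middleman": (4, 5),
    "leader": (101, float("inf")),
}


def candidate_roles(count, ranges=ROLE_RANGES):
    """Return every class whose range contains the given connection count."""
    return {role for role, (low, high) in ranges.items() if low <= count <= high}


def assign_classes(counts, ranges=ROLE_RANGES):
    """Map each class to the set of users whose connection count fits it."""
    classes = {role: set() for role in ranges}
    for user, count in counts.items():
        for role in candidate_roles(count, ranges):
            classes[role].add(user)
    return classes


# Toy connection counts for illustration only.
toy_counts = {1: 40, 2: 30, 3: 5, 4: 250, 5: 12}
toy_classes = assign_classes(toy_counts)
```

Because the ranges may overlap, a person can be a candidate for more than one class. The later structural constraints resolve such ambiguities.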

 

3.      Visual Analytics Approach

At first the analysis was led by the idea of concentrating on the scenario with more available information and simpler constraints, which is clearly scenario A. Scenario B did not appear to be directly supported by the data, given the fixed values for the middlemen’s connection counts (2-3 contacts). The only way scenario B could hold was if the middlemen had contact with more than one of the handlers.

We therefore concentrated on scenario A first and used its constraints to reduce the dataset. The critical point was to check which users of the class employee were connected to at least 3 persons of the class handler, and whether all of those handlers had contact with someone with 4-5 contacts. That person, codenamed Boris, also had to have contact with the fearless leader, who has a connection count of over 100.

We wrote our Java tool in an iterative process, which took us about 6 hours. In each step we added a new constraint and then visualized the results with Pajek. Some constraints, e.g. that the handlers are not allowed to communicate among themselves, were not included because they could easily be checked in the visualization. As a result we got exactly one network that matched the given constraints of scenario A.
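The scenario A constraints can be expressed as a small search, shown here as a Python sketch. It is not our Java tool. The function names, the data layout and the toy network are assumptions made for illustration. As in our tool, the rule that handlers must not communicate among themselves is left out.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into {user: set of contacts}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def find_scenario_a(adjacency, classes):
    """Find (employee, three handlers, shared middleman, leader) structures."""
    matches = []
    for employee in sorted(classes["employee"]):
        contacts = adjacency.get(employee, set())
        handlers = sorted((contacts & classes["handler"]) - {employee})
        for trio in combinations(handlers, 3):
            # A middleman must be a contact of all three handlers.
            shared = set.intersection(*(adjacency[h] for h in trio))
            for middleman in sorted(shared & classes["middleman"]):
                for leader in sorted(adjacency[middleman] & classes["leader"]):
                    matches.append((employee, trio, middleman, leader))
    return matches


# Toy network for illustration only: E matches, E2 has just two handlers.
toy_edges = [
    ("E", "H1"), ("E", "H2"), ("E", "H3"),
    ("H1", "M"), ("H2", "M"), ("H3", "M"),
    ("M", "L"),
    ("E2", "H1"), ("E2", "H2"),
]
toy_classes = {
    "employee": {"E", "E2"},
    "handler": {"H1", "H2", "H3"},
    "middleman": {"M"},
    "leader": {"L"},
}
toy_matches = find_scenario_a(build_adjacency(toy_edges), toy_classes)
```

Each added constraint shrinks the list of matches, which mirrors the iterative way in which we extended the tool and checked the intermediate results in Pajek.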

The next step was to add tool support for scenario B. We again checked which users of the class employee were connected to at least 3 persons of the class handler. This time, however, each handler could have his own middleman with 2-4 contacts, and these middlemen had to have contact with one potential leader. In the end we saw no evidence in the data that scenario B would match.
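A comparable check for scenario B can be sketched as follows, again in Python and not as a copy of our Java tool. The names, the data layout and the toy network are hypothetical. The sketch assumes that "his own middleman" means a different middleman for each handler and that all of these middlemen share one leader.

```python
from itertools import combinations, product


def find_scenario_b(adjacency, classes):
    """Find (employee, handlers, one middleman per handler, leader) structures."""
    matches = []
    for employee in sorted(classes["employee"]):
        contacts = adjacency.get(employee, set())
        handlers = sorted((contacts & classes["handler"]) - {employee})
        for trio in combinations(handlers, 3):
            for leader in sorted(classes["leader"]):
                # Middlemen who are in contact with this leader.
                via = adjacency.get(leader, set()) & classes["middleman"]
                options = [sorted(adjacency[h] & via) for h in trio]
                for choice in product(*options):
                    # Each handler needs a middleman of his own.
                    if len(set(choice)) == len(trio):
                        matches.append((employee, trio, choice, leader))
    return matches


# Toy adjacency for illustration only. E is a full match; F is a near miss,
# because the middleman N3 has no contact with the leader K.
toy_adjacency = {
    "E": {"H1", "H2", "H3"},
    "H1": {"E", "M1"}, "H2": {"E", "M2"}, "H3": {"E", "M3"},
    "M1": {"H1", "L"}, "M2": {"H2", "L"}, "M3": {"H3", "L"},
    "L": {"M1", "M2", "M3"},
    "F": {"G1", "G2", "G3"},
    "G1": {"F", "N1"}, "G2": {"F", "N2"}, "G3": {"F", "N3"},
    "N1": {"G1", "K"}, "N2": {"G2", "K"}, "N3": {"G3"},
    "K": {"N1", "N2"},
}
toy_classes = {
    "employee": {"E", "F"},
    "handler": {"H1", "H2", "H3", "G1", "G2", "G3"},
    "middleman": {"M1", "M2", "M3", "N1", "N2", "N3"},
    "leader": {"L", "K"},
}
toy_matches = find_scenario_b(toy_adjacency, toy_classes)
```

On the challenge data, a search of this kind corresponds to our finding that no structure satisfied the scenario B constraints.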

By mapping the network structure onto the map of Flovania, we realized that the fearless leader did not live in a larger city. Because this geospatial implication was mentioned in the task description, we decided to validate the result again.

To do that, we used SQL statements and visualizations. We started again by looking for employees with connections to at least 3 handlers, which left only 13 potential employees. Then we queried for the connections to potential handlers, middlemen and leaders and visualized the result set for each potential employee separately.

Figure 2 shows the visualization for the employee with the ID 19. This network structure nearly matches the constraints of scenario B: four handlers are connected to one employee. The drawback is that there are only two handlers whose middlemen (two in total) have contact with one leader. In our analysis we found no matching structure for scenario B at all.

19.bmp

Figure 2 – Network structure of employee with id 19

 

Visualizing the network for the employee with ID 100 (figure 3) makes it easy to see that it fits the network structure of scenario A: one employee is connected to 3 handlers, who are all connected to one middleman, who in turn is connected to the leader. This is the only matching structure we found in the data. This largely manual analysis of the 13 employees took us about 2 hours.

 

100.bmp

Figure 3 – Network structure of employee with id 100

 

4.      Result

To visualize our final result, we took the detected employee, the three handlers, the middleman and the fearless leader and queried for all connections between these persons. We also added all international contacts of the fearless leader and the contact of the middleman Boris to the not yet mentioned member of the organization. Figure 4 shows our final network, which seems to be the best match for the task.

result2.png

Figure 4 - The complete resulting network of the criminal organization.

 

We believe that the person with the ID 100 is the employee and that the persons with the IDs 194, 261 and 563 are his handlers. As the three handlers have contact with only one person from the group of persons with 4 or 5 contacts, this person has to be the middleman Boris, who has the ID 4994. Boris in turn has only one contact in the group of persons with over 100 contacts: the person with the ID 4, who seems to be the Fearless Leader. We obtained all of these IDs with the help of our self-written tool. Furthermore, we found one person who is linked with Boris, so it is very probable that the person with the ID 1612 is also a member of the organization.


MC2.4:  How is your hypothesis about the social structure in Part 1 supported by the city locations of Flovania? What part(s), if any, did the role of geographical information play in the social network of part one? 

Looking at the geo data, we can’t confirm the suggestion that the leader lives in one of the bigger cities. Nevertheless, we think that our result is clearly supported by the locations of the members. Figure 5 shows that the employee lives in the 2nd largest city, just like his direct contacts. As we expect the embassy to be located in a bigger city (either Koul or Prounov), the employee’s place of residence matches this scenario. Boris lives in a mid-sized city, and the leader lives geographically separated from the rest of the network.

Because there weren’t many points to plot on the map, we did this manually with a drawing program, which took us about half an hour.

 


MC2.5:  In general, how are the Flitter users dispersed throughout the cities of this challenge? Which of the surrounding countries may have ties to this criminal operation?  Why might some be of more significant concern than others?

In general, the Flitter users are dispersed throughout the cities in proportion to the number of inhabitants. However, because we didn’t have exact information about the number of inhabitants, we can’t be sure that this is correct. We analyzed this with a simple SQL statement.
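The proportionality check can be illustrated with a short Python sketch. It is not the SQL statement we used. The city names, user assignments and population shares are made-up toy values, and the population shares stand in for the inhabitant numbers that we could only estimate.

```python
from collections import Counter


def user_share_by_city(user_cities):
    """Return the fraction of all users that lives in each city."""
    counts = Counter(user_cities.values())
    total = sum(counts.values())
    return {city: n / total for city, n in counts.items()}


def share_ratio(user_share, population_share):
    """Ratio of user share to population share; values near 1.0 mean proportional."""
    return {
        city: user_share[city] / population_share[city]
        for city in user_share
        if city in population_share
    }


# Toy values for illustration only, not taken from the challenge data.
toy_user_cities = {1: "CityA", 2: "CityA", 3: "CityA", 4: "CityB"}
toy_population_share = {"CityA": 0.75, "CityB": 0.25}

toy_user_share = user_share_by_city(toy_user_cities)
toy_ratio = share_ratio(toy_user_share, toy_population_share)
```

A ratio well above or below 1.0 for a city would indicate that Flitter users are over- or under-represented there.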

As figure 5 shows, Posana may have stronger ties to the organization, because 50 percent of all international contacts live in Otello. However, because the numbers of contacts to Posana, Trium and Transak differ by only 3 or 4 contacts, we think that this is no significant concentration of contacts to Posana.

geo.png

Figure 5 – Geo-mapping of the suspect network